Integrating Many Techniques for Discovering Structure in Data

Authors

  • Dawn E. Gregory
  • Paul R. Cohen
Abstract

This paper describes a formal representation of the discovery process that efficiently integrates any number of data analysis strategies, regardless of their similarities and differences. We have implemented a system based on this formalization, called the Scientist's Empirical Assistant (SEA). SEA employs several analysis strategies from the discovery literature, including techniques for function finding, causal modeling, and Bayesian conditioning. It uses high-level knowledge about the discovery process, the strategies, and the domain of study to coordinate the selection and application of analyses. It relies on the skills and initiatives of an expert user to guide its search for structure. Finally, it designs and runs experiments with a simulator to verify its findings. SEA is currently capable of performing a full cycle of discovery and analysis, from hypothesis formulation to experiment design, data collection, analysis, and the generation of explanatory hypotheses. In addition, the formalization on which it is based provides an environment for studying the how, what, why, and when of intelligent data analysis.


Similar resources

A Proposed Data Mining Methodology and its Application to Industrial Procedures

Data mining is the process of discovering correlations, patterns, trends or relationships by searching through a large amount of data stored in repositories, corporate databases, and data warehouses. Industrial procedures with the help of engineers, managers, and other specialists, comprise a broad field and have many tools and techniques in their problem-solving arsenal. The purpose of this st...


Integrating AHP and data mining for effective retailer segmentation based on retailer lifetime value

Data mining techniques have been used widely in the area of customer relationship management (CRM). In this study, we have applied data mining techniques to address a problem in business-to-business (B2B) setting. In a manufacturer-retailer-consumer chain, a manufacturer should improve its relationship with retailers to continue its business. Segmentation is a useful tool for identifying groups...


An introduction to methods of discovering and identifying ancient sites with emphasis on evidence and geomorphologic techniques

Recognizing the position of ancient sites is of great help to archaeologists. After this recognition, archaeologists, relying on the knowledge and usual techniques of archaeology, can determine the extent of the sites. With this information, they can learn about the social, economic, livelihood, and political past of the sites. In this researc...


Combining data envelopment analysis and multi-objective model for the efficient facility location–allocation decision

This paper proposes an innovative procedure for finding efficient facility location–allocation (FLA) schemes, integrating data envelopment analysis (DEA) and a multi-objective programming (MOP) model methodology. FLA decisions provide a basic foundation for designing an efficient supply chain network in many practical applications. The procedure proposed in this paper would be applied to the FLA pr...


Strategic Cost-Cutting in Information Technology: toward a Framework for Enhancing the Business Value of IT

The increasing dependency of many businesses on information technology (IT) and the high percentage of IT investment in all invested capital in the business environment call for more attention to this important driver of business. The limitation of capital budgets forces managers to look for wiser investment in IT. There are many cost-cutting techniques in the literature and each of them ha...




Publication date: 1997